# Low Power CMOS Pass Logic 4-2 Compressor for High-Speed Multiplication

# D. Radhakrishnan

Department of Electrical and Computer Engineering State University of New York, 75 South Manheim Blvd. New Paltz, NY 12561

Abstract- A novel CMOS 4-2 compressor using pass logic is presented in this paper. An XOR-XNOR combination gate is used to build the circuit while totally eliminating the use of inverters. The total power dissipation has been cut down to a minimum while providing the full output voltage swing at all nodes in the circuit. Furthermore, the complete circuit is implemented with a bare minimum of 28 transistors.

#### I. INTRODUCTION

Enhancing the performance of floating point operations is indispensable for current high performance microprocessors. In this regard, high speed multiplication is becoming one of the key operations in RISCs, graphics accelerators, real-time image and signal processing, real-time speech recognition and so on, due to the increasing demand from multimedia applications. Because of the computationally intensive processing requirements of these applications, low-power dissipation is becoming a primary design goal in many portable video, audio and computing systems [1]. The majority of power consumption in CMOS circuits is due to the dynamic power dissipated in charging and discharging the transistor parasitic capacitances. This calls for design solutions with a minimum number of signal changes per operation cycle (switching activity). A number of studies on high speed and/or low power multipliers have been reported in the literature [2-6].

A fast array multiplier can be divided into three parts: a Booth encoder, a partial product summation tree and a final adder. The partial product summation tree is responsible for a significant portion of the total multiplication delay. It is well known that a 4-2 compressor can be used to construct the tournament adder with a regularly structured Wallace tree, giving low complexity [2].

#### II. 4-2 COMPRESSOR DESIGNS

A 4-2 compressor has five inputs and three outputs and can be implemented with two stages of full adders (FA) connected in series as shown in Fig. 1. Various approaches have been proposed in the literature to improve the speed of such compressors. In [3], a logic circuit optimization was used to shorten the critical path as shown in Fig. 2. A direct implementation of the 4-2 compressor shown in Fig. 2 in pseudo-CMOS logic is given in [4]. That design needs seven transistors to implement each XOR gate. Furthermore, the large number of inverters it uses increases the switching activity and hence the power consumption. In [5], the full adders are implemented using MUX and XOR gates in complementary pass transistor logic (CPL) and connected in series to form the compressor as in Fig. 1.
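The two-stage structure of Fig. 1 can be checked with a short behavioural model. The sketch below is illustrative only: the function names are hypothetical, and the wiring (X1, X2, X3 into the first full adder; its sum, X4 and Cin into the second) is an assumption, since the figure itself is not reproduced here. It checks exhaustively that the five input bits and three output bits satisfy X1 + X2 + X3 + X4 + Cin = S + 2(C + Cout).

```python
from itertools import product


def full_adder(a, b, c):
    """Return (sum, carry) for three input bits."""
    s = a ^ b ^ c
    carry = (a & b) | (b & c) | (a & c)
    return s, carry


def compressor_two_fa(x1, x2, x3, x4, cin):
    """4-2 compressor as two full adders in series (assumed wiring)."""
    s1, cout = full_adder(x1, x2, x3)  # first stage, independent of cin
    s, c = full_adder(s1, x4, cin)     # second stage
    return s, c, cout


def preserves_bit_count():
    """True if inputs and weighted outputs agree for all 32 input patterns."""
    for x1, x2, x3, x4, cin in product((0, 1), repeat=5):
        s, c, cout = compressor_two_fa(x1, x2, x3, x4, cin)
        if x1 + x2 + x3 + x4 + cin != s + 2 * (c + cout):
            return False
    return True


if __name__ == "__main__":
    print("arithmetic identity holds:", preserves_bit_count())
```

Because Cout is produced by the first stage alone, it never waits for Cin, which is what lets compressors be chained along a row without a rippling carry.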

# A.P. Preethy

School of Computer Engineering Nanyang Technological University Nanyang Ave., Singapore 639798



Fig. 1. 4-2 Compressor composed of two FAs.



Fig. 2. 4-2 Compressor.

A multiplier architecture with XORs as building blocks is presented in [6] which exhibits a 33% performance improvement over that of a Wallace tree multiplier. They used a novel design of a 4-2 compressor based on a modified set of equations for the sum and carry outputs of the compressor as:

$$\begin{split} \mathbf{S} &= \mathbf{X}_1 \oplus \mathbf{X}_2 \oplus \mathbf{X}_3 \oplus \mathbf{X}_4 \oplus \mathbf{C}_{\text{in}} \\ \mathbf{C} &= \big( \mathbf{X}_1 \oplus \mathbf{X}_2 \oplus \mathbf{X}_3 \oplus \mathbf{X}_4 \big) \mathbf{C}_{\text{in}} + \big( \overline{\mathbf{X}_1 \oplus \mathbf{X}_2 \oplus \mathbf{X}_3 \oplus \mathbf{X}_4} \big) \mathbf{X}_4 \\ \mathbf{C}_{\text{out}} &= \big( \mathbf{X}_1 \oplus \mathbf{X}_2 \big) \mathbf{X}_3 + \big( \overline{\mathbf{X}_1 \oplus \mathbf{X}_2} \big) \mathbf{X}_1 \end{split}$$
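Each carry expression above has the form of a 2:1 multiplexer whose select line is an XOR output. The following Python sketch, with hypothetical function names, is one way to check these equations exhaustively against the arithmetic definition of a 4-2 compressor and to confirm that Cout does not depend on Cin. It is a behavioural check only and says nothing about the transistor-level circuit.

```python
from itertools import product


def compressor_xor_mux(x1, x2, x3, x4, cin):
    """Evaluate the sum and carry equations using XORs and 2:1 MUXs."""
    h12 = x1 ^ x2
    h = h12 ^ x3 ^ x4
    s = h ^ cin
    c = cin if h else x4       # MUX selected by X1^X2^X3^X4
    cout = x3 if h12 else x1   # MUX selected by X1^X2
    return s, c, cout


def equations_are_consistent():
    """Check the arithmetic identity and the Cin-independence of Cout."""
    for x1, x2, x3, x4 in product((0, 1), repeat=4):
        couts = set()
        for cin in (0, 1):
            s, c, cout = compressor_xor_mux(x1, x2, x3, x4, cin)
            if x1 + x2 + x3 + x4 + cin != s + 2 * (c + cout):
                return False
            couts.add(cout)
        if len(couts) != 1:    # Cout changed when only Cin changed
            return False
    return True


if __name__ == "__main__":
    print("equations consistent:", equations_are_consistent())
```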

An equivalent gate logic realization using XORs and MUXs is shown in Fig. 3. Even though their CPL implementation is very efficient for realizing MUXs and XORs, it needs pull-up circuits and inverters to counter the reduced voltage swing and weak signal transmission. A similar 4-2 compressor circuit in complementary pass transistor logic is presented in [7] that uses a minimum of 40 transistors. CPL-style designs present small input loads, provide good output driving capability due to their output inverters, and have a fast differential stage. This differential stage, on the other hand, leads to considerably larger short-circuit currents. Furthermore, the substantial number of nodes in the circuit accounts for increased switching activity.

A purely MUX-based implementation of a 4-2 compressor using CMOS pass transistors is given in [8]. This implementation needs CMOS inverters for inverting the input bits and the outputs of some intermediate MUXs. The inverters at the inputs have the highest switching activity of all nodes in the circuit, and hence the power dissipation of this circuit is increased.

A double pass transistor logic (DPL) implementation of the gate logic structure shown in Fig. 3 has been shown to exhibit lower power consumption and higher speed than earlier designs, due to its reduction of the internal load capacitances in the critical path [9]. However, the XOR/XNOR gates used in DPL use a total of 8 transistors and also need input inverters. This increases the overall transistor count to 58 and also increases the power dissipation due to the high switching activity in the input inverters.

Recently, a number of low-power low-voltage 4-2 compressors were presented in [10]. They are mostly based on BiCMOS technology and were implemented using a DPL configuration. All of these implementations use structural configurations similar to the one given in Fig. 2, which increases the overall delay. The large number of inverters used in these designs also increases the power dissipation.

From the above, it is obvious that for high speed and low power, pass logic implementations are the best choice. The use of inverters in the circuit must also be minimized while aiming for low power [11].



Fig. 3. 4-2 Compressor using XORs and MUXs.

To satisfy the above requirements, this paper presents a novel 4-2 compressor based on pass logic that completely eliminates the use of inverters, uses the smallest number of transistors (only 28) and at the same time provides fully restored output voltages on all nodes in the circuit. This implementation uses the recently developed pass transistor XOR/XNOR gates and full adder circuits [12].

#### III. PASS LOGIC XOR/XNOR GATES AND FULL ADDER CELLS

A number of full adder circuits have been developed recently with the aim of low power and high speed. In this regard, a structured approach for designing low power adder cells was presented in [11]. This is done by partitioning the full adder cell into three independent submodules as shown in Fig. 4.

The sum and carry expressions are given by $S = H \oplus C_{in}$ and $C_{out} = \overline{H}\,A + H\,C_{in}$, where $H = A \oplus B$. Six independent designs for Module 1 (XOR, XNOR combination), four for Module 2 and one for Module 3 were given in [11]. A total of 25 different adders (including the conventional CMOS adder), obtained by taking all possible combinations of the above, were simulated and tabulated in the order of their normalized average power consumption.

A further minimized version of the XOR/XNOR gate presented in [11] and its associated full adder circuit are given in [12] and are reproduced here in Fig. 5. This new design makes use of the pass logic design techniques presented in [13,14]. The pass logic design equations for the sum and carry outputs of a full adder are given as:

$$S = \overline{H}\,(C_{in}) + H\,(\overline{C_{in}})$$

$$C = \overline{H}\,(A) + H\,(C_{in})$$

where $H = \overline{A}\,(B) + A\,(\overline{B})$. Here $P(V)$ denotes that the variable $V$ is passed to the output when the control $P$ is 1.
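The pass logic equations above can be exercised with a small behavioural model. The sketch below is an illustration under stated assumptions, not the design procedure of [13,14]: it reads each term $P(V)$ as "pass $V$ when control $P$ is 1", and the names `pass_network`, `inv` and `full_adder_pass` are hypothetical. It flags a node that is left floating or driven to two different values, and checks the result against ordinary binary addition.

```python
from itertools import product


def pass_network(terms):
    """Resolve a node driven by (control, passed_value) pass terms.

    A term drives the node with passed_value when its control is 1.
    Raises ValueError if no term is active (floating node) or if the
    active terms disagree (conflict).
    """
    driven = {value for control, value in terms if control}
    if len(driven) != 1:
        raise ValueError("floating or conflicting node")
    return driven.pop()


def inv(bit):
    """Logical complement, used only to model an available complemented
    signal (for example an XNOR output); it does not imply an inverter."""
    return 1 - bit


def full_adder_pass(a, b, cin):
    """Full adder evaluated term by term from the pass logic equations."""
    h = pass_network([(inv(a), b), (a, inv(b))])
    s = pass_network([(inv(h), cin), (h, inv(cin))])
    c = pass_network([(inv(h), a), (h, cin)])
    return s, c


def adder_is_correct():
    """True if s + 2c equals a + b + cin for all eight input patterns."""
    return all(
        a + b + cin == s + 2 * c
        for a, b, cin in product((0, 1), repeat=3)
        for s, c in [full_adder_pass(a, b, cin)]
    )


if __name__ == "__main__":
    print("pass logic full adder correct:", adder_is_correct())
```

In every equation the two controls are complements of each other, so exactly one term is active for each input pattern and no node is left floating.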

The XOR-XNOR combination gate in Fig. 5 uses only 6 transistors while providing fully restored outputs at its nodes. The power consumption of this gate is the lowest reported so far.

#### IV. NEW 4-2 COMPRESSOR USING PASS LOGIC

The new 4-2 compressor uses the design equations given in [6] and is based on the gate level realization shown in Fig. 3. These equations are rewritten in pass logic form as:

$$C_{out} = \overline{X_1 \oplus X_2}\,(X_1) + [X_1 \oplus X_2]\,(X_3)$$

Fig. 4. Building modules of the full adder cell.



Fig. 5. XOR/XNOR gate and the full adder.

$$\begin{split} \mathbf{S} &= \overline{\mathbf{H}_{3}} \left( \mathbf{C}_{\mathsf{in}} \right) + \mathbf{C}_{\mathsf{in}} \left( \overline{\mathbf{H}_{3}} \right) + \bar{\mathbf{C}_{\mathsf{in}}} \left( \mathbf{H}_{3} \right) \\ \mathbf{C} &= \overline{\mathbf{H}_{3}} \left( \mathbf{X}_{4} \right) + \mathbf{H}_{3} \left( \mathbf{C}_{\mathsf{in}} \right) \end{split}$$

where  $H_3 = X_1 \oplus X_2 \oplus X_3 \oplus X_4$ 
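As a sanity check, the pass logic form of the compressor equations can be evaluated for all 32 input patterns. The sketch below is a behavioural model only, with hypothetical names (`resolve`, `inv`, `compressor_pass`), and it reads each term $P(V)$ as "pass $V$ when control $P$ is 1". It keeps all three terms of the sum expression as written, verifies that no node floats or sees conflicting values, and compares the outputs with the arithmetic identity X1 + X2 + X3 + X4 + Cin = S + 2(C + Cout).

```python
from itertools import product


def resolve(terms):
    """Return the value driven by the active (control, value) pass terms.

    Raises ValueError for a floating node or for conflicting drivers.
    """
    driven = {value for control, value in terms if control}
    if len(driven) != 1:
        raise ValueError("floating or conflicting node")
    return driven.pop()


def inv(bit):
    """Complement of a bit; models the XNOR output or a complementary
    control, not a physical inverter."""
    return 1 - bit


def compressor_pass(x1, x2, x3, x4, cin):
    """4-2 compressor evaluated from its pass logic equations."""
    h12 = x1 ^ x2                 # XOR output; inv(h12) is the XNOR output
    h3 = h12 ^ x3 ^ x4
    cout = resolve([(inv(h12), x1), (h12, x3)])
    s = resolve([(inv(h3), cin), (cin, inv(h3)), (inv(cin), h3)])
    c = resolve([(inv(h3), x4), (h3, cin)])
    return s, c, cout


def compressor_is_correct():
    """True if every pattern resolves cleanly and sums correctly."""
    for x1, x2, x3, x4, cin in product((0, 1), repeat=5):
        s, c, cout = compressor_pass(x1, x2, x3, x4, cin)
        if x1 + x2 + x3 + x4 + cin != s + 2 * (c + cout):
            return False
    return True


if __name__ == "__main__":
    print("pass logic compressor correct:", compressor_is_correct())
```

In the three-term sum, the terms controlled by $C_{in}$ and $\overline{C_{in}}$ already cover every input pattern, and the term controlled by $\overline{H_3}$ always agrees with them when it is active, so the model never reports a conflict.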

Because both XOR and XNOR outputs are available, the carry generation MUXs do not need any extra inverters, and none of the inputs needs an inverter. The complete circuit diagram is shown in Fig. 6. The circuit therefore contains no inverters, and the total number of transistors used in Fig. 6 is only 28, compared with the 40 of the smallest design reported so far [7].



Fig. 6. New CMOS pass logic 4-2 compressor.

#### V. CONCLUSIONS

The 4-2 compressor presented in this paper makes use of the recently developed full adder circuits based on the low power XOR-XNOR combination gate. In this new design, inverters are totally eliminated from the circuit. The total number of transistors used is only 28, whereas the lowest reported in the literature is 40. Furthermore, it provides full voltage swing at all nodes in the circuit.

#### REFERENCES

- [1] G.M. Blair, "Designing Low Power CMOS," IEEE Electronics and Communication Engineering Journal, vol. 6, pp. 229-236, 1994.
- [2] C.S. Wallace, "A Suggestion for a Fast Multiplier," IEEE Trans. Electron Comput., vol. 13, pp. 14-17, Feb. 1964.
- [3] M. Nagamatsu, S. Tanaka, J. Mori, T. Noguchi and K. Hatanaka, "A 15nS 32X32-bit CMOS Multiplier with an Improved Parallel Structure," Proc. IEEE Custom Integrated Circuits Conf., pp. 10.3.1-10.3.4, 1989.
- [4] J. Mori, M. Nagamatsu, M. Hirano, S. Tanaka, M. Noda, Y. Toyoshima, K. Hashimoto, H. Hayashida and K. Maeguchi, "A 10-ns 54X54-b Parallel Structured Full Array Multiplier with 0.5um CMOS Technology," IEEE J. Solid-State Circuits, vol. 26, pp. 600-606, April 1991.
- [5] K. Yano, T. Yamanaka, T. Nishida, M. Saito, H. Shimohigashi and A. Shimizu, "A 3.8-ns CMOS 16X16-b Multiplier Using Complementary Pass-Transistor Logic," IEEE J. Solid-State Circuits, vol. 25, pp. 388-395, April 1990.
- [6] D. Ghosh, S.K. Nandy and K. Parthasarathy, "TWTXBB: A Low Latency, High Throughput Multiplier Architecture Using a New 4-2 Compressor," 7th Intl. Conf. on VLSI Design, Calcutta, India, pp. 77-82, Jan. 1994.
- [7] Y. Kanie, Y. Kubota, S. Toyoyama, Y. Iwase and S. Tsuchimoto, "4-2 Compressor with Complementary Pass-Transistor Logic," IEICE Trans. Electron., vol. E77-C, no. 4, pp. 647-649, April 1994.
- [8] N. Ohkubo, M. Suzuki, T. Shinbo, T. Yamanaka, A. Shimizu, K. Sasaki and Y. Nakagome, "A 4.4-ns CMOS 54X54-b Multiplier Using Pass-transistor Multiplexer," Proc. IEEE Custom Integrated Circuits Conf., pp. 26.4.1-26.4.4, 1994.
- [9] S. F. Hsiao, M.R. Jiang and J.S. Yeh, "Design of high-speed low-power 3-2 counter and 4-2 compressor for fast multipliers," Electronics Letters, vol. 34, no. 4, pp. 341-342, Feb. 1998.
- [10] M. Margala and N.G. Durdle, "Low-Power Low-Voltage 4-2 Compressors for VLSI Applications," Proc. Workshop on Low Power Design, 1999.
- [11] A.M. Shams and M.A. Bayoumi, "A Structured Approach for Designing Low Power Adders," Proc. 31st ASILOMAR Conf. on Signals, Systems and Computers, vol. 1, pp. 751-761, 1998.
- [12] D. Radhakrishnan, "Low Voltage CMOS Full Adder Cells," Electronics Letters, vol. 35, no. 21, pp. 1792-1794, Oct. 1999.
- [13] D. Radhakrishnan, S.R. Whitaker, and G.K. Maki, "Formal Design Procedures for Pass Transistor Switching Circuits," IEEE Journal of Solid-State Circuits, vol. 20, pp. 531-536, April 1985.
- [14] D. Radhakrishnan, "Design of CMOS Circuits," IEE Proceedings - Circuits Devices and Systems, vol. 138, no. 1, pp. 83-90, Feb. 1991.